
Support partial downloads #1020

Merged · 8 commits · Apr 8, 2017

Conversation

jelford
Contributor

@jelford jelford commented Mar 31, 2017

This PR should close #889

A couple of implementation details:

Only added support for the curl backend; we previously discussed that there's an intention to get rid of rustup's own download code, and the default feature set uses curl anyway, so hopefully this is okay.

Added new testing to the download crate - while it's there, it makes sense to have a test. Since we're using curl's "resume" functionality, I figured it's probably fine to just use file:// URLs for test cases. I previously tested using a small hyper-based HTTP server, but that feels like overkill.

For hashing files, I've set the buffer size to 2^15 bytes - just because that's what strace tells me sha256sum uses on my local PC. It seems much slower than that command, though, and it's not obvious why, so maybe I've done something silly here.

Finally, and maybe most controversially, I haven't done anything about cleaning up aborted partials. I don't really know when a good time is to do this, but a couple of suggestions that I'd be happy to implement:

  • Every run, just check the download cache for any files > 7 days old and smoke them
  • On self-update, as that seems like a natural time for generic "maintenance" sorts of operations

Same disclaimer as in my last PR: I haven't written much Rust, so I fully expect you will see some problems (also very happy to accept style criticisms). I accidentally ran rustfmt on some things, so apologies for the noise (I can revert, but... maybe it's worth having anyway?).

* Adds support only for the Curl backend, which is the default anyway.

* In order to distinguish between a file that has been fully downloaded
(but not used yet) and should therefore be hash-checked vs. those that
require more data, store partials with a .partial extension.

* Adds a simple http-server to rustup-mock, to allow the download module
to be properly tested. It's not clear how to easily emulate a server
that stops half-way without that. The tests for the overall
download-resumption functionality should be fairly re-usable if we migrate
to another download solution in the future (e.g. in rust-lang#993)

* Don't bother with resumption for metadata files, since they're likely
to go out of date anyway. This allows the removal of the HTTP test server,
and simplifies test cases, as we can just read/write files on disk, like
the existing dist tests.
…hes, and make consistent with partial download code.
@jelford
Contributor Author

jelford commented Mar 31, 2017

Not sure about that travis error: "couldn't find crate for std." I don't know whether there's a way to just give it another kick?

@Diggsey
Contributor

Diggsey commented Mar 31, 2017

I restarted that job. The appveyor issues are legit though.

@jelford
Contributor Author

jelford commented Apr 1, 2017

Yes, that could definitely never work. I don't know why I put the conditional there. I can't check my work, though, as I don't have a Windows box, but I think I've just pushed a fix.

@Diggsey
Contributor

Diggsey commented Apr 1, 2017

Thanks @jelford
Could you revert the formatting changes from this PR? The Rust style guidelines are currently in the process of being changed (away from visual indentation, by the look of things), so I don't think we should reformat anything at this time - and even if we did, it should be separate from functional changes to the code.

While it would be nice to move the download code out of rustup, I'm not sure that the intent is to switch to curl. I'm not necessarily opposed to adding this kind of feature just for the curl backend if it will help people on poor connections - @brson, thoughts?

@jelford
Contributor Author

jelford commented Apr 1, 2017

Thanks for the quick response @Diggsey.
I've reverted those formatting changes, sorry for the noise.

On switching to curl, I just meant to observe that it's the default, so most people should get the benefit right away. I'm hoping 90% of this goes away as part of #993, so I wasn't keen to spend too long getting it into the hyper backend.

As a side note, from a quick look I don't think reqwest currently supports resume as a first-class thing. I was thinking it's easy enough to just set the appropriate header, but we need to think about the error case where the server doesn't support it: libcurl will just bomb out when it gets a response without the Content-Range header. Is that actually fine? Do all the mirrors support it? It works for me locally, but I don't know whether it's S3 everywhere or something else in different parts of the world. If not, it would be a shame to break the whole download for what is essentially a bandwidth optimization.

Options in that case would be:

  • add an environment variable to de-activate resume requests
  • fail gracefully from the specific curl error and fall back to downloading the whole file.

The former is simple, but not as nice a UI.

@brson
Contributor

brson commented Apr 5, 2017

While it would be nice to move the download code out of rustup, I'm not sure that the intent is to switch to curl though? I'm not necessarily opposed to adding this kind of feature just for the curl backend though if it will help people on poor connections - @brson thoughts?

My intent is to switch to reqwest, assuming that reqwest uses native-tls, and someday gets support for rustls. The rustup download crate was supposed to basically be reqwest, but at this point it seems best to do the hyper/rustls transition in the ecosystem, not in rustup. Adding resume support just means we'll have to make reqwest support that too (or otherwise create a crate that does what reqwest does with resume support).

@brson
Contributor

brson commented Apr 5, 2017

Good tests, thanks @jelford.

@brson
Contributor

brson commented Apr 5, 2017

Is that actually fine? Do all the mirrors support it? It works for me locally, but I don't know whether it's S3 everywhere or something else in different parts of the world. If not, it would be a shame to break the whole thing for what is essentially a bandwidth optimization.

I think we cannot assume much about the servers, because people will at some point host their own rustup mirrors.

@brson
Contributor

brson commented Apr 5, 2017

Patch looks good to me, r=me if @Diggsey is happy.

@Diggsey
Contributor

Diggsey commented Apr 5, 2017

@bors r=brson

@bors
Contributor

bors commented Apr 5, 2017

📌 Commit c63b275 has been approved by brson

@bors
Contributor

bors commented Apr 5, 2017

⌛ Testing commit c63b275 with merge 8c481c8...

bors added a commit that referenced this pull request Apr 5, 2017
Support partial downloads

@bors
Contributor

bors commented Apr 5, 2017

💔 Test failed - status-appveyor

@brson
Contributor

brson commented Apr 8, 2017

Windows flakiness

@brson brson merged commit 4cdbe1b into rust-lang:master Apr 8, 2017
Successfully merging this pull request may close these issues.

Support resume of partial downloads